Model Selection with Nonlinear Embedding for Unsupervised Domain Adaptation

Authors

  • Hemanth Venkateswara
  • Shayok Chakraborty
  • Troy L. McDaniel
  • Sethuraman Panchanathan
Abstract

Domain adaptation deals with adapting classifiers trained on data from a source distribution to work effectively on data from a target distribution. In this paper, we introduce the Nonlinear Embedding Transform (NET) for unsupervised domain adaptation. The NET reduces cross-domain disparity through nonlinear domain alignment. It also embeds the domain-aligned data such that similar data points are clustered together, which results in enhanced classification. To determine the parameters in the NET model (and in other unsupervised domain adaptation models), we introduce a validation procedure that samples source data points which are similar in distribution to the target data. We test the NET and the validation procedure using popular image datasets and compare the classification results against competitive procedures for unsupervised domain adaptation.

Introduction

There are large volumes of unlabeled data available online, owing to the exponential increase in the number of images and videos uploaded to the web. It would be easy to obtain labeled data if trained classifiers could predict the labels for unlabeled data. However, classifier models do not perform well when applied to unlabeled data from different distributions, owing to domain shift (Torralba and Efros 2011). Domain adaptation deals with adapting classifiers trained on data from a source distribution to work effectively on data from a target distribution (Pan and Yang 2010). Some domain adaptation techniques assume the presence of a few labels for the target data, to assist in training a domain-adaptive classifier (Aytar and Zisserman 2011; Duan, Tsang, and Xu 2012; Hoffman et al. 2013). However, real-world applications need not provide labeled data in the target domain; adaptation in this setting is termed unsupervised domain adaptation.

Many unsupervised domain adaptation techniques can be organized into linear and nonlinear procedures, based on how the data is handled by the domain adaptation model. A linear domain adaptation model performs linear transformations on the data to align the source and target domains, or it trains an adaptive linear classifier for both domains; for example, a linear SVM (Bruzzone and Marconcini 2010). Nonlinear techniques are deployed in situations where the source and target domains cannot be aligned using linear transformations. These techniques apply nonlinear transformations on the source and target data in order to align them. For example, Maximum Mean Discrepancy (MMD) is applied to learn nonlinear representations where the difference between the source and target distributions is minimized (Pan et al. 2011). Even though nonlinear transformations may align the domains, the resulting data may not be conducive to classification. If, after domain alignment, the data were clustered based on similarity, this could lead to effective classification. We demonstrate this intuition through a binary classification problem using a toy dataset.

Figure 1: (Best viewed in color) Two-moon binary classification problem with source data in blue and target data in red. We assume the target labels are unknown. (a) Original data, (b) KPCA aligns the data along nonlinear directions of maximum variance, (c) MMD aligns the two domains, (d) MMD + similarity-based embedding aligns the domains and clusters the data to ensure easy classification.
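The toy setup of Figure 1(a)-(b) is easy to reproduce. The sketch below is a minimal illustration, not the paper's exact experimental code: it generates a two-moon source domain, creates a target domain by an assumed rotation, and applies KPCA as in Figure 1(b); the rotation angle and kernel parameters are our assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA

# Source domain: the standard two-moon dataset.
Xs, ys = make_moons(n_samples=200, noise=0.05, random_state=0)

# Target domain: an (assumed) rotated copy of the two moons,
# simulating domain shift; target labels are treated as unknown.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Xt = make_moons(n_samples=200, noise=0.05, random_state=1)[0] @ R.T

# KPCA projects source and target onto common nonlinear directions
# of maximum variance (Figure 1b); note that this alone does not
# reduce the domain disparity.
X = np.vstack([Xs, Xt])
Z = KernelPCA(n_components=2, kernel="rbf", gamma=2.0).fit_transform(X)
Zs, Zt = Z[:200], Z[200:]
print(Zs.shape, Zt.shape)  # (200, 2) (200, 2)
```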
Figure (1a) displays the source and target domains of a two-moon dataset. Figure (1b) depicts the transformed data after KPCA (nonlinear projection). In trying to project the data onto a common 'subspace', the source data gets dispersed. Figure (1c) presents the data after domain alignment using Maximum Mean Discrepancy (MMD). Although the domains are now aligned, this does not necessarily ensure enhanced classification. Figure (1d) shows the data after MMD and similarity-based embedding, where data is clustered based on class-label similarity. Cross-domain alignment along with similarity-based embedding makes the data classification friendly.

In this work, we present the Nonlinear Embedding Transform (NET) procedure for unsupervised domain adaptation. The NET performs a nonlinear transformation to align the source and target domains and also clusters the data based on label similarity. The NET algorithm is a spectral (eigen) technique that requires certain parameters (like the number of eigen bases) to be pre-determined. These parameters are often given arbitrary values, which need not be optimal (Pan et al. 2011; Long et al. 2013; Long et al. 2014). In this work, we also outline a validation procedure to fine-tune model parameters with a validation set created from the source data. In the following, we outline the two main contributions of our work:

  • Nonlinear Embedding Transform (NET) algorithm for unsupervised domain adaptation.
  • Validation procedure to estimate optimal parameters for an unsupervised domain adaptation algorithm.

We evaluate the validation procedure and the NET algorithm using 7 popular domain adaptation image datasets, including object, face, facial expression and digit recognition datasets. We conduct 50 different domain adaptation experiments to compare the proposed techniques with existing competitive procedures for unsupervised domain adaptation.
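The full description of the validation procedure lies beyond this excerpt; the abstract, however, states the core idea: build a validation set from source points that are similar in distribution to the target, and tune model parameters on it. A minimal sketch of that idea follows, where scoring source points by their mean RBF-kernel similarity to the target set, and the helper name sample_validation_set, are our assumptions rather than the paper's exact criterion.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def sample_validation_set(Xs, Xt, frac=0.2, gamma=1.0):
    """Select the fraction of source points that look most
    target-like, to serve as a validation set for tuning
    domain adaptation hyper-parameters.

    The scoring rule (mean RBF-kernel similarity to the target
    data) is an assumed stand-in for the paper's criterion.
    """
    # Mean similarity of each source point to the target cloud.
    scores = rbf_kernel(Xs, Xt, gamma=gamma).mean(axis=1)
    n_val = max(1, int(frac * len(Xs)))
    val_idx = np.argsort(scores)[-n_val:]  # indices of most target-like points
    return val_idx

# Usage: hold out the selected source points (with their labels) and
# choose the model parameters that classify this validation set best.
```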
Related Work

For the purpose of this paper, we discuss the relevant literature under the categories of linear and nonlinear domain adaptation methods. A detailed survey of transfer learning procedures can be found in (Pan and Yang 2010), and a survey of domain adaptation techniques for vision data is provided by (Patel et al. 2015). The Domain Adaptive SVM (DASVM) (Bruzzone and Marconcini 2010) is an unsupervised method that iteratively adapts a linear SVM from the source to the target. In recent years, the popular unsupervised linear domain adaptation procedures are Subspace Alignment (SA) (Fernando et al. 2013) and Correlation Alignment (CA) (Sun, Feng, and Saenko 2015). The SA algorithm determines a linear transformation to project the source and target to a common subspace where the domain disparity is minimized. CA argues that aligning the correlation matrices of the source and target data is sufficient to reduce domain disparity. Both SA and CA are linear procedures, whereas the NET is a nonlinear method. Although deep learning procedures are inherently highly nonlinear, we limit the scope of our work to nonlinear transformations of data that usually involve a positive semidefinite kernel function; such procedures are closely related to the NET. However, in our experiments we do study the NET with deep features as well.

The Geodesic Flow Kernel (GFK) (Gong et al. 2012) is a popular domain adaptation method where the subspace spanning the source data is gradually transformed into the target subspace along a path on the Grassmann manifold of subspaces. Spectral procedures like Transfer Component Analysis (TCA) (Pan et al. 2011), Joint Distribution Adaptation (JDA) (Long et al. 2013) and Transfer Joint Matching (TJM) (Long et al. 2014) are the techniques most closely related to the NET. All of these procedures solve a generalized eigenvalue problem in order to determine a projection matrix that nonlinearly aligns the source and target data. In these spectral methods, domain alignment is implemented using variants of MMD, which was first introduced in the TCA procedure. JDA introduces joint distribution alignment, an improvement over TCA, which only incorporates marginal distribution alignment. TJM performs domain alignment along with instance selection by sampling only relevant source data points. In addition to domain alignment with MMD, the NET algorithm implements similarity-based embedding for enhanced classification. We also introduce a validation procedure to estimate the model parameters for unsupervised domain adaptation approaches.

Domain Adaptation With Nonlinear Embedding

In this section, we first outline the NET algorithm for unsupervised domain adaptation. We then describe a cross-validation procedure that is used to estimate the model parameters for the NET algorithm. We begin with the problem definition, where we consider two domains: a source domain $S$ and a target domain $T$. Let $D_s = \{(\mathbf{x}_i^s, y_i^s)\}_{i=1}^{n_s} \subset S$ be a subset of the source domain and $D_t = \{(\mathbf{x}_i^t, y_i^t)\}_{i=1}^{n_t} \subset T$ be a subset of the target domain. Let $X_S = [\mathbf{x}_1^s, \ldots, \mathbf{x}_{n_s}^s] \in \mathbb{R}^{d \times n_s}$ and $X_T = [\mathbf{x}_1^t, \ldots, \mathbf{x}_{n_t}^t] \in \mathbb{R}^{d \times n_t}$ be the source and target data points respectively, and let $Y_S = [y_1^s, \ldots, y_{n_s}^s]$ and $Y_T = [y_1^t, \ldots, y_{n_t}^t]$ be the source and target labels respectively. Here, $\mathbf{x}_i^s, \mathbf{x}_i^t \in \mathbb{R}^d$ are data points and $y_i^s, y_i^t \in \{1, \ldots, C\}$ are the associated labels. We define $X := [\mathbf{x}_1, \ldots, \mathbf{x}_n] = [X_S, X_T]$, where $n = n_s + n_t$. The problem of domain adaptation deals with the situation where the joint distributions for the source and target domains are different, i.e. $P_S(X, Y) \neq P_T(X, Y)$, where $X$ and $Y$ denote random variables for data points and labels respectively. In the case of unsupervised domain adaptation, the labels $Y_T$ are unknown. The goal of unsupervised domain adaptation is to estimate the labels $\hat{Y}_T = [\hat{y}_1^t, \ldots, \hat{y}_{n_t}^t]$ of the target data $X_T$, using $D_s$ and $X_T$.

Nonlinear Domain Alignment

A common procedure to align two datasets is to first project them to a common subspace. Kernel PCA (KPCA) estimates a nonlinear basis for such a projection. In this case, data is implicitly mapped to a high-dimensional (possibly infinite-dimensional) space defined by $\Phi(X) = [\phi(\mathbf{x}_1), \ldots, \phi(\mathbf{x}_n)]$, where $\phi : \mathbb{R}^d \to \mathcal{H}$ is the mapping function and $\mathcal{H}$ is a Reproducing Kernel Hilbert Space (RKHS). The dot product between the mapped vectors $\phi(\mathbf{x})$ and $\phi(\mathbf{y})$ is estimated by a positive semi-definite (psd) kernel, $k(\mathbf{x}, \mathbf{y}) = \phi(\mathbf{x})^\top \phi(\mathbf{y})$, and captures the similarity between $\mathbf{x}$ and $\mathbf{y}$. The kernel similarity (Gram) matrix consisting of the similarities between all the data points in $X$ is given by $K = \Phi(X)^\top \Phi(X) \in \mathbb{R}^{n \times n}$. The matrix $K$ is used to determine the projection matrix $A$ by solving

$$\max_{A^\top A = I} \ \mathrm{tr}(A^\top K H K A). \qquad (1)$$

Here, $H$ is the $n \times n$ centering matrix given by $H = I - \frac{1}{n}\mathbf{1}$, where $I$ is the identity matrix and $\mathbf{1}$ is an $n \times n$ matrix of ones.
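Since $KHK$ is symmetric, Eq. (1) under the orthogonality constraint is maximized by the top-$k$ eigenvectors of $KHK$. A minimal sketch of this step is given below; the RBF kernel choice and the helper name kpca_projection are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

def kpca_projection(X, k=20, gamma=1.0):
    """Solve Eq. (1): the columns of A are the top-k eigenvectors
    of K H K, and Z = A^T K is the nonlinearly projected data."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma=gamma)      # Gram matrix, n x n
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    M = K @ H @ K                          # symmetric psd matrix
    # eigh returns eigenvalues in ascending order; keep the top k.
    vals, vecs = eigh(M)
    A = vecs[:, -k:]                       # n x k coefficient matrix
    Z = A.T @ K                            # k x n projected data
    return A, Z
```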
$A \in \mathbb{R}^{n \times k}$ is the matrix of coefficients, and the nonlinearly projected data is given by $Z = [\mathbf{z}_1, \ldots, \mathbf{z}_n] = A^\top K \in \mathbb{R}^{k \times n}$. Along with projecting the source and target data to a common subspace, the domain disparity between the two datasets must also be reduced. We employ the Maximum Mean Discrepancy (MMD) (Gretton et al. 2009), a standard nonparametric measure of domain disparity. We adopt the Joint Distribution Adaptation (JDA) (Long et al. 2013) algorithm, which seeks to align both the marginal and conditional probability distributions of the projected data. The marginal distributions are aligned by estimating the coefficient matrix $A$ that minimizes

$$\Big\| \frac{1}{n_s} \sum_{i=1}^{n_s} A^\top \mathbf{k}_i - \frac{1}{n_t} \sum_{j=n_s+1}^{n} A^\top \mathbf{k}_j \Big\|^2 = \mathrm{tr}(A^\top K M_0 K A), \qquad (2)$$

where $\mathbf{k}_i$ is the $i$-th column of $K$ and $M_0$ is the MMD matrix with entries $(M_0)_{ij} = \frac{1}{n_s^2}$ if $\mathbf{x}_i, \mathbf{x}_j \in D_s$; $(M_0)_{ij} = \frac{1}{n_t^2}$ if $\mathbf{x}_i, \mathbf{x}_j \in D_t$; and $(M_0)_{ij} = \frac{-1}{n_s n_t}$ otherwise.
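Putting the pieces together, a sketch of the marginal-MMD alignment step is given below. Following the TCA/JDA-style formulation that the text adopts, it minimizes $\mathrm{tr}(A^\top K M_0 K A)$ while preserving projected variance via the constraint $A^\top K H K A = I$, solved as a generalized eigenproblem; the regularizer mu, the numerical jitter, and all default values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

def mmd_alignment(Xs, Xt, k=20, gamma=1.0, mu=0.1):
    """Align marginal distributions by minimizing the MMD of Eq. (2),
    tr(A^T K M0 K A), subject to A^T K H K A = I (TCA/JDA-style)."""
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, gamma=gamma)

    # MMD matrix M0: 1/ns^2 (source-source), 1/nt^2 (target-target),
    # -1/(ns*nt) for the cross terms; built as an outer product.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    M0 = np.outer(e, e)

    H = np.eye(n) - np.ones((n, n)) / n
    # Generalized eigenproblem: (K M0 K + mu I) a = lambda (K H K) a.
    # The k smallest eigenvalues give the alignment directions.
    lhs = K @ M0 @ K + mu * np.eye(n)
    rhs = K @ H @ K + 1e-6 * np.eye(n)  # jitter keeps rhs positive definite
    vals, vecs = eigh(lhs, rhs)
    A = vecs[:, :k]                     # n x k coefficient matrix
    Z = A.T @ K                         # k x n projected data
    return Z[:, :ns], Z[:, ns:]         # projected source / target
```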



Journal: CoRR
Volume: abs/1706.07527
Publication date: 2017